Creating a visualisation to show the average rating and proportion of cocoa percent (% chocolate) greater than or equal to 70% by top 15 company location.
In this take-home exercise, we apply appropriate data visualisation techniques, using ggplot2 methods, to show the average rating and the proportion of cocoa percent (% chocolate) greater than or equal to 70% for the top 15 company locations.
The chocolate.csv dataset was used to show the average rating and proportion of cocoa percent (% chocolate) greater than or equal to 70% by top 15 company location.
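The code chunk below installs (if needed) and loads the packages required to create the visualisation.
# Packages required for the visualisation
packages <- c('ggstatsplot', 'ggside', 'knitr',
              'tidyverse', 'broom', 'ggdist', 'dplyr', 'plotly', 'DT', 'crosstalk')
for (p in packages) {
  # Install the package if it is not yet available, then attach it
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}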
Step 1: Isolate the columns needed (i.e. company_location, rating and cocoa_percent).
Step 2: Remove “%” from cocoa_percent and convert it to numeric.
# Read the raw data
choco <- read_csv("data/chocolate.csv")
# Strip the "%" sign from cocoa_percent and convert it to numeric
choco$cocoa_percent <- as.numeric(gsub(pattern = "%", replacement = "", x = choco$cocoa_percent))
# Keep only the columns needed
chocodf <- choco %>% select(company_location, rating, cocoa_percent)
# Convert rating to numeric
chocodf$rating <- as.numeric(chocodf$rating)
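The plots that follow read from two summary tables, avg_rating_top15 and avg_percent_top15, with the columns company_location, n, mean and se. A minimal sketch of how such tables might be built is given here, assuming "top 15" means the 15 most frequent company locations and that se is the standard deviation divided by the square root of n. The helper summarise_top() and the toy data are hypothetical and for illustration only; this is not necessarily how the original tables were produced.

```r
library(dplyr)

# Toy stand-in for chocodf (illustrative values, not taken from chocolate.csv)
chocodf_demo <- tibble(
  company_location = c("U.S.A.", "U.S.A.", "U.S.A.", "France", "France", "Canada", "Canada"),
  rating           = c(3.25, 3.50, 2.75, 3.75, 3.00, 3.50, 3.25),
  cocoa_percent    = c(70, 72, 65, 75, 70, 68, 74)
)

# Hypothetical helper: per-location count, mean and standard error of one column,
# keeping only the `top` most frequent locations
summarise_top <- function(df, value, top = 15) {
  df %>%
    group_by(company_location) %>%
    summarise(
      n    = n(),
      mean = mean({{ value }}, na.rm = TRUE),
      sd   = sd({{ value }}, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(se = sd / sqrt(n)) %>%                 # standard error of the mean
    slice_max(n, n = top, with_ties = FALSE)      # keep the most frequent locations
}

avg_rating_demo  <- summarise_top(chocodf_demo, rating)
avg_percent_demo <- summarise_top(chocodf_demo, cocoa_percent)
```

With the real data, the same helper could be called as summarise_top(chocodf, rating) and summarise_top(chocodf, cocoa_percent).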
proportion cocoa percent (% chocolate)
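The proportion of bars with cocoa percent of at least 70% is not computed in the code shown here. One possible way to obtain it per company location is sketched below, relying on the fact that the mean of a logical vector is the share of TRUE values. The toy data and the column name prop_70 are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for chocodf (illustrative values, not taken from chocolate.csv)
cocoa_demo <- tibble(
  company_location = c("U.S.A.", "U.S.A.", "U.S.A.", "U.S.A.", "France", "France"),
  cocoa_percent    = c(70, 72, 65, 60, 75, 70)
)

# Share of bars per location whose cocoa percent is at least 70
prop_70_demo <- cocoa_demo %>%
  group_by(company_location) %>%
  summarise(
    n       = n(),
    prop_70 = mean(cocoa_percent >= 70, na.rm = TRUE),
    .groups = "drop"
  )
```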
# Mean rating per location, with error bars of mean +/- 1.98 * SE (roughly a 95% interval)
ggplot(avg_rating_top15) +
  geom_errorbar(
    aes(x = reorder(company_location, -n),
        ymin = mean - 1.98 * se,
        ymax = mean + 1.98 * se),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    size = 0.5) +
  geom_point(
    aes(x = reorder(company_location, -n),
        y = mean),
    stat = "identity",
    colour = "red",
    size = 1.5,
    alpha = 1) +
  xlab("Company Location") +
  ylab("Average Rating") +
  ggtitle("Standard error of mean rating of top 15 company locations (based on frequency)") +
  # Stagger the x-axis labels over two rows to avoid overlap
  scale_x_discrete(guide = guide_axis(n.dodge = 2))
# Mean cocoa percentage per location, with error bars of mean +/- 1.98 * SE (roughly a 95% interval)
ggplot(avg_percent_top15) +
  geom_errorbar(
    aes(x = reorder(company_location, -n),
        ymin = mean - 1.98 * se,
        ymax = mean + 1.98 * se),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    size = 0.5) +
  geom_point(
    aes(x = reorder(company_location, -n),
        y = mean),
    stat = "identity",
    colour = "red",
    size = 1.5,
    alpha = 1) +
  xlab("Company Location") +
  ylab("Average Cocoa Percentage (%)") +
  ggtitle("Standard error of mean cocoa percentage of top 15 company locations (based on frequency)") +
  # Stagger the x-axis labels over two rows to avoid overlap
  scale_x_discrete(guide = guide_axis(n.dodge = 2))
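We then create an interactive plot so that the two plots can be compared directly to identify trends.
The code chunk below performs a left join of the two datasets, avg_rating_top15 and avg_percent_top15, to create a single dataset for the visualisation. The merge() function is used; because both datasets share the column names n, mean and se, merge() appends the suffix .x to the columns from avg_rating_top15 and .y to those from avg_percent_top15.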
# Combine the two datasets (left join on company_location)
forggplotly <- merge(x = avg_rating_top15, y = avg_percent_top15, by = "company_location", all.x = TRUE)
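To make the effect of the left join concrete, here is a small sketch on toy tables (illustrative values only, not from chocolate.csv). It shows how base R merge() names the overlapping columns and how a location that is missing from the second table is kept with NA values.

```r
# Toy summary tables with overlapping column names (illustrative values only)
rating_demo <- data.frame(
  company_location = c("U.S.A.", "France", "Canada"),
  n    = c(3, 2, 2),
  mean = c(3.2, 3.4, 3.3)
)
percent_demo <- data.frame(
  company_location = c("U.S.A.", "France"),
  n    = c(3, 2),
  mean = c(69.0, 72.5)
)

# all.x = TRUE keeps every row of the first table (left join)
joined_demo <- merge(x = rating_demo, y = percent_demo,
                     by = "company_location", all.x = TRUE)

# Overlapping columns receive the suffixes .x (first table) and .y (second table)
names(joined_demo)
```

In this toy case Canada appears only in the first table, so its n.y and mean.y entries are NA after the join.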
# Wrap the data in a shared (crosstalk) object so that highlighting can be linked between the plots
d <- highlight_key(forggplotly)
# Suffix .x = rating columns, suffix .y = cocoa percent columns
p1 <- ggplot(d) +
  geom_errorbar(
    aes(x = reorder(company_location, -n.x),
        ymin = mean.x - 1.98 * se.x,
        ymax = mean.x + 1.98 * se.x),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    size = 0.5) +
  geom_point(
    aes(x = reorder(company_location, -n.x),
        y = mean.x),
    stat = "identity",
    colour = "red",
    size = 1.5,
    alpha = 1) +
  xlab("Company Location") +
  ylab("Average Rating") +
  theme(axis.text.x = element_text(angle = 45, size = 10)) +
  ggtitle("Standard error of mean rating of top 15 company locations (based on frequency)")
p2 <- ggplot(d) +
  geom_errorbar(
    aes(x = reorder(company_location, -n.y),
        ymin = mean.y - 1.98 * se.y,
        ymax = mean.y + 1.98 * se.y),
    width = 0.2,
    colour = "black",
    alpha = 0.9,
    size = 0.5) +
  geom_point(
    aes(x = reorder(company_location, -n.y),
        y = mean.y),
    stat = "identity",
    colour = "red",
    size = 1.5,
    alpha = 1) +
  xlab("Company Location") +
  ylab("Average Cocoa Percentage (%)") +
  theme(axis.text.x = element_text(angle = 45, size = 10)) +
  ggtitle("Standard error of mean cocoa percentage of top 15 company locations\n(based on frequency)")
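# Convert both ggplot objects to interactive plotly widgets
gg1 <- ggplotly(p1)
gg2 <- ggplotly(p2)
# widths = 12 is recycled for both widgets, so each spans the full 12-column row
crosstalk::bscols(gg1,
                  gg2,
                  widths = 12)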